Answer with adequate details and clearly cite your references whenever applicable. All work must be done individually.

December 16th, 2015

admin

Question 1 (10 pts) Let our database be:

Employee {EmpId, DeptId, Salary}
Department {DeptId, ManagerId, Budget}

Assume that all of table Employee is stored at site_1 and all of Department is stored in site_2. The query:

SELECT *
FROM Employee E, Department D
WHERE E.EmpId = D.ManagerId

is issued at site_3. We know that only 1% of the employees are managers.

A. What are the options that you have in terms of processing the query? What other information do you need in order to evaluate the cost of each option and make a decision? To solve this problem, make some realistic assumptions for the information that you need, and then select what you consider to be the best option (in terms of cost) for executing the query. Your answer must include all the necessary details.

B. After you answer the previous question, give your analysis, opinion, and evaluation of using the prevailing cost models for query processing in a distributed database environment.
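
One way to compare the options in part A is to tabulate the bytes each strategy ships over the network. The sketch below is illustrative only: the row counts, row sizes, and key width are assumptions (none are given in the question), communication cost is taken to be proportional to bytes transferred, and local processing cost is ignored.

```python
# Illustrative cost comparison for the query issued at site_3.
# Every size below is an ASSUMPTION, not a value given in the question.
N_EMP = 10_000             # assumed number of Employee rows (site_1)
N_MGR = N_EMP // 100       # 1% of employees are managers (from the question)
N_DEPT = N_MGR             # assumed: one distinct manager per department (site_2)
EMP_ROW = 100              # assumed bytes per Employee row
DEPT_ROW = 100             # assumed bytes per Department row
KEY = 4                    # assumed bytes per EmpId / ManagerId value

emp_bytes = N_EMP * EMP_ROW                   # whole Employee table
dept_bytes = N_DEPT * DEPT_ROW                # whole Department table
mgr_emp_bytes = N_MGR * EMP_ROW               # Employee rows that are managers
result_bytes = N_MGR * (EMP_ROW + DEPT_ROW)   # SELECT * returns both rows joined
mgr_ids_bytes = N_DEPT * KEY                  # projection of Department on ManagerId

costs = {
    # Ship both tables to site_3 and join there.
    "ship both to site_3": emp_bytes + dept_bytes,
    # Ship Employee to site_2, join there, ship the result to site_3.
    "join at site_2": emp_bytes + result_bytes,
    # Ship Department to site_1, join there, ship the result to site_3.
    "join at site_1": dept_bytes + result_bytes,
    # Semijoin: send ManagerIds to site_1, then send only the matching
    # Employee rows and the Department table to site_3 and join there.
    "semijoin, join at site_3": mgr_ids_bytes + mgr_emp_bytes + dept_bytes,
}

best = min(costs, key=costs.get)
for name, cost in sorted(costs.items(), key=lambda kv: kv[1]):
    print(f"{name:26s} {cost:>10,d} bytes")
print("cheapest under these assumptions:", best)
```

Under these particular assumptions the semijoin strategy ships the least data, because it avoids moving the 99% of Employee rows that cannot match. With different row sizes or a cost model that weights message count or local I/O, the ranking could change.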

Question 2 (10 pts)
Assume that you are assigned the task of leading the process of designing a distributed database for a large global search engine. How will you handle this task? Explain your approach, the design process, and the important factors that will impact the design and implementation process. Remember that companies that provide search engines also rely on advertisements, sponsorship of products and services, and many other business practices as part of their business models.

Question 3
One method to ensure privacy of data is to distribute the data into fragments so that, in the case of illegal access to the data at one site, the damage is very limited. For example, if someone were able to access names and social security numbers together, the damage could be significant. However, if someone can access the social security numbers without the names, there is very little concern from a security perspective. Another approach is to use encryption; however, this approach should be avoided as much as possible and used only as a last resort.

A certain company is very concerned about the privacy and security of its information that is stored in table Project. We need to vertically fragment the relation:

Project {project_name, mngr_id, duration, budget, priority}

into two fragments. The relation has the following privacy constraints (meaning the attributes in each set cannot be stored together in the same fragment):
o {budget, mngr_id}
o {budget, start_date}
o {budget, priority}

What are good vertical fragments, in order to minimize the use of encryption? The fragmentation of data is performed in such a fashion as to ensure that the exposure of the contents of any one database does not result in a violation of privacy. Explain your answer.
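
The constraint check described above can be sketched as a small enumeration. This is only an illustration under stated assumptions: project_name is assumed to be the key and to be replicated in both fragments, the attribute list is taken from the Project schema as written, and the constraint sets are copied as listed (start_date does not appear in the schema as written, so that constraint never fires here). The helper names are hypothetical.

```python
from itertools import combinations

KEY = "project_name"  # ASSUMPTION: key attribute, replicated in both fragments
NON_KEY = ["budget", "mngr_id", "duration", "priority"]  # from the Project schema
CONSTRAINTS = [
    {"budget", "mngr_id"},
    {"budget", "start_date"},  # start_date is not in the schema as written
    {"budget", "priority"},
]

def violates(fragment, constraints):
    """True if the fragment contains every attribute of some constraint set."""
    return any(c <= fragment for c in constraints)

def safe_splits(non_key, constraints):
    """Enumerate two-way splits of the non-key attributes with no violation.

    The first attribute is pinned to the left fragment so that mirror-image
    splits are not reported twice.
    """
    first, rest = non_key[0], non_key[1:]
    found = []
    for r in range(len(rest)):  # the right fragment must stay non-empty
        for extra in combinations(rest, r):
            left = {first, *extra}
            right = set(non_key) - left
            if not violates(left, constraints) and not violates(right, constraints):
                found.append((sorted(left), sorted(right)))
    return found

splits = safe_splits(NON_KEY, CONSTRAINTS)
for left, right in splits:
    print([KEY] + left, "|", [KEY] + right)
```

Any split it reports keeps budget apart from mngr_id and priority, so no attribute would need encryption under the constraints as coded.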

Question 4 (5 pts)
Most of the time, the topic of Distributed Databases is discussed in conjunction with Parallel Databases. What are the most important similarities and differences between them? What are the factors that will impact your decision of selecting one versus the other?

Question 5 (10 pts)
Company A acquired Company B. You are assigned the task of leading the process of merging both databases so that all applications can access the data as a single database. How will you handle this task? Explain your approach, the design process, and the important factors that will impact the design and implementation process.

Question 6
Let our database be:

Employee {EmpId, DeptId, Salary, Sex, DOB}
The following statistics are stored in the DBMS’s system tables:
o There are 10000 rows in the table.
o There are 20 departments in the company.
o Sex is either ‘M’ or ‘F’.
o The lowest salary is 20,000 and the highest is 100,000.
o Data values in each column are uniformly (flatly) distributed.

What is the selectivity of the following minterms:

o SALARY < 35000

o ((SEX = ‘M’ AND DEPT = ‘HR’) OR
(SEX = ‘F’ AND DEPT = ‘Software’)) AND
(SALARY > 50000 OR SALARY < 25000)
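
The selectivity arithmetic for predicates like these can be sketched as follows. This is a rough illustration, not a definitive answer: it assumes salary can be treated as continuous and uniform on [20000, 100000], that the columns are independent, that DEPT refers to the department and that ‘HR’ and ‘Software’ are two of the 20 departments, and that disjoint alternatives can simply be added. The function names are hypothetical.

```python
from fractions import Fraction

SAL_MIN, SAL_MAX = 20_000, 100_000   # from the stored statistics
N_ROWS = 10_000
N_DEPTS = 20
N_SEX_VALUES = 2

def sel_salary_less(v):
    """Fraction of rows with SALARY < v under a uniform distribution."""
    return Fraction(v - SAL_MIN, SAL_MAX - SAL_MIN)

def sel_salary_greater(v):
    """Fraction of rows with SALARY > v under a uniform distribution."""
    return Fraction(SAL_MAX - v, SAL_MAX - SAL_MIN)

sel_sex = Fraction(1, N_SEX_VALUES)   # SEX = one specific value
sel_dept = Fraction(1, N_DEPTS)       # DEPT = one specific department

# Predicate 1: SALARY < 35000
sel_1 = sel_salary_less(35_000)

# Predicate 2: the two sex/department branches cannot both hold for one row,
# and neither can the two salary ranges, so each OR is a plain sum.
sel_sex_dept = sel_sex * sel_dept + sel_sex * sel_dept
sel_salary = sel_salary_greater(50_000) + sel_salary_less(25_000)
sel_2 = sel_sex_dept * sel_salary     # AND of independent conditions

print("predicate 1:", sel_1, "=", float(sel_1), "-> about", float(sel_1 * N_ROWS), "rows")
print("predicate 2:", sel_2, "=", float(sel_2), "-> about", float(sel_2 * N_ROWS), "rows")
```

Using Fraction keeps the intermediate values exact, which makes it easy to see how each factor contributes before converting to a decimal or an expected row count.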

Question 7 (5 pts)
Consider the following three relations (underlined names indicate keys):

o E (eid, ename, salary, location), which contains the ID, name, salary, and location of employees of some company. Only the values “NY” and “LA” appear in the location attribute.
o P(pname, mid, start, budget, priority), containing the name, manager ID, start date, budget, and priority of current projects at the company.
o A(pname, eid), containing the project-employee allocations.

When E is accessed, the following predicates appear in a majority of the queries:
salary < 100; salary  200; location = LA; location = NY

What are the fragments that result from performing primary horizontal fragmentation on E, using the predicates listed above?
